Predicting Which Passengers Survived the Titanic

**Problem Statement**

The sinking of the Titanic is one of the most infamous shipwrecks in history.

On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew.

While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).

1. Import Libraries & Load Dataset

1.1. Import Libraries

1.2. Load Dataset

2. Exploratory Data Analysis (EDA)

2.1. Separate Numerical & Categorical Data
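One way to make this split is by column dtype. The sketch below is a minimal illustration that assumes pandas; the toy rows and the names `numerical_cols` and `categorical_cols` are placeholders, not the notebook's actual variables.

```python
import pandas as pd

# Toy rows with Titanic-style columns (illustrative values, not the real data).
df = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0],
    "Fare": [7.25, 71.28, 7.92],
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "S"],
})

# Numerical columns: any integer or float dtype.
numerical_cols = df.select_dtypes(include="number").columns.tolist()
# Categorical columns: everything that is not numeric.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(numerical_cols)
print(categorical_cols)
```

Keeping the two lists separate makes it easy to apply summary statistics to one group and value counts to the other.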

3. Data Preprocessing

3.1. Data Cleaning

3.1.1. Drop Irrelevant Features
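A minimal sketch of this step, assuming pandas. Which columns count as irrelevant is an assumption here (`PassengerId`, `Name`, and `Ticket` are used as examples); the list should match whatever the analysis actually discards.

```python
import pandas as pd

# Toy rows with Titanic-style columns (illustrative values only).
df = pd.DataFrame({
    "PassengerId": [1, 2],
    "Name": ["Passenger A", "Passenger B"],
    "Ticket": ["T-001", "T-002"],
    "Fare": [7.25, 71.28],
    "Survived": [0, 1],
})

# Hypothetical list of columns treated as irrelevant for modelling.
irrelevant_cols = ["PassengerId", "Name", "Ticket"]

# drop returns a new DataFrame; the original df is left untouched.
df_clean = df.drop(columns=irrelevant_cols)
print(df_clean.columns.tolist())
```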

3.1.2. Handle Missing Values

Three features have missing values: Age, Cabin, and Embarked.
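The sketch below shows one possible way to handle these three columns, assuming pandas and NumPy. The strategy itself is an assumption, not necessarily the one used in this notebook: the median for Age, the most frequent value for Embarked, and dropping Cabin.

```python
import numpy as np
import pandas as pd

# Toy rows with gaps in the same three columns (illustrative values only).
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", np.nan, "S"],
})

# Count missing values per column before cleaning.
missing_before = df.isna().sum()

# Assumed strategy: median for Age, most frequent value for Embarked,
# and drop Cabin because most of its entries are missing.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
df = df.drop(columns=["Cabin"])

# Count again to confirm nothing is left to fill.
missing_after = df.isna().sum()
print(missing_before, missing_after, sep="\n\n")
```

If the test set is imputed as well, reusing the statistics computed on the training set keeps the two consistent.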

**Train Data**

No feature has missing values, and Fare contains zero values.

**Test Data**

No missing values in the test dataset.

3.2. Feature Engineering

3.2.1. Log Transformation
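A log transformation compresses a right-skewed feature. The sketch below assumes the transformed column is Fare and uses `np.log1p`, which computes log(1 + x), so that a fare of 0 stays finite; both choices are assumptions, and `Fare_log` is a placeholder column name.

```python
import numpy as np
import pandas as pd

# Toy fares including a zero (illustrative values only).
df = pd.DataFrame({"Fare": [0.0, 7.25, 71.28, 512.33]})

# log1p(x) = log(1 + x): a fare of 0 maps to 0 instead of -inf.
df["Fare_log"] = np.log1p(df["Fare"])

print(df)
```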

3.2.2. Standardization
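Standardization rescales each numerical feature to zero mean and unit variance. The sketch below assumes scikit-learn's `StandardScaler` and toy Age and Fare columns; fitting on the training data only and reusing the fitted scaler for the test data is a common convention rather than something the notebook states.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy numerical features (illustrative values only).
train = pd.DataFrame({"Age": [20.0, 30.0, 40.0], "Fare": [10.0, 20.0, 30.0]})
test = pd.DataFrame({"Age": [30.0], "Fare": [40.0]})

scaler = StandardScaler()

# Fit on the training data only, then reuse the same mean and standard
# deviation for the test data so both live on the same scale.
train_scaled = pd.DataFrame(scaler.fit_transform(train), columns=train.columns)
test_scaled = pd.DataFrame(scaler.transform(test), columns=test.columns)

print(train_scaled)
print(test_scaled)
```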

3.2.3. One Hot Encoding
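One-hot encoding turns each category into its own 0/1 column. A minimal sketch with `pd.get_dummies` follows, assuming Sex and Embarked are the categorical features being encoded. The `reindex` step is an added safeguard so that the test frame ends up with the same columns as the training frame even when a category is absent from it.

```python
import pandas as pd

# Toy categorical features (illustrative values only).
train = pd.DataFrame({"Sex": ["male", "female", "female"],
                      "Embarked": ["S", "C", "Q"]})
test = pd.DataFrame({"Sex": ["female", "male"],
                     "Embarked": ["S", "S"]})

# One 0/1 column per category value.
train_enc = pd.get_dummies(train, columns=["Sex", "Embarked"], dtype=int)
test_enc = pd.get_dummies(test, columns=["Sex", "Embarked"], dtype=int)

# Align test columns with train; categories missing from test are filled with 0.
test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)

print(train_enc)
print(test_enc)
```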

4. Modelling & Prediction

4.1. Separate Feature & Target For Validation & Prediction
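A minimal sketch of the split, assuming pandas and that the target column is `Survived` as in the Kaggle training file. The feature columns shown are placeholders.

```python
import pandas as pd

# Toy preprocessed training frame (illustrative values only).
train = pd.DataFrame({
    "Pclass": [3, 1, 3],
    "Fare_log": [2.1, 4.3, 2.2],
    "Survived": [0, 1, 1],
})

# Features: every column except the target.
X = train.drop(columns=["Survived"])
# Target: the column the model should predict.
y = train["Survived"]

print(X.shape, y.shape)
```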

4.2. Cross Validation Score

The highest mean cross-validation score is 0.7919, with a standard deviation of 0.0285. Since a simpler algorithm is preferable, I will use logistic regression.
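A comparison like this can be sketched with scikit-learn's `cross_val_score`. Everything specific below is an assumption: the data is synthetic (from `make_classification`), the candidate models and the 5-fold stratified split are illustrative choices, and the printed numbers will not match the scores reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the preprocessed Titanic features and target.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Candidate models are an assumption; the notebook's own list is not shown.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    # Store the mean and standard deviation of the fold accuracies.
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: mean={scores.mean():.4f}, std={scores.std():.4f}")
```

Reporting the standard deviation next to the mean shows how much the score varies between folds, which helps when two models have similar averages.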

4.3. Prediction
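The final step can be sketched as fitting the chosen model on the full training set and predicting on the test set. The toy features and IDs below are placeholders; the `PassengerId` and `Survived` column pair follows the Kaggle submission format, and the output file name is an assumption.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy preprocessed data (illustrative values only).
X_train = pd.DataFrame({"Sex_female": [0, 1, 1, 0, 1, 0],
                        "Pclass": [3, 1, 2, 3, 1, 2]})
y_train = pd.Series([0, 1, 1, 0, 1, 0], name="Survived")
X_test = pd.DataFrame({"Sex_female": [1, 0], "Pclass": [1, 3]})
test_ids = pd.Series([892, 893], name="PassengerId")

# Fit logistic regression on all training rows.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Pair each test passenger ID with its predicted label (0 or 1).
submission = pd.DataFrame({
    "PassengerId": test_ids,
    "Survived": model.predict(X_test),
})
print(submission)

# Uncomment to write the file for upload (file name is an assumption).
# submission.to_csv("submission.csv", index=False)
```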